40 research outputs found

    Evaluating Dependency Parsing Performance on German Learner Language

    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili MĂŒĂŒrisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 175-186. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15891

    Short Answer Assessment in Context: The Role of Information Structure

    Short Answer Assessment (SAA), the computational task of judging the appropriateness of an answer to a question, has received much attention in recent years (cf., e.g., Dzikovska et al. 2013; Burrows et al. 2015). Most researchers have approached the problem as one similar to paraphrase recognition (cf., e.g., Brockett & Dolan 2005) or textual entailment (Dagan et al., 2006), where the answer to be evaluated is aligned to another available utterance, such as a target answer, in a sufficiently abstract way to capture form variation. While this is a reasonable strategy, it fails to take the explicit context of an answer into account: the question. In this thesis, we present an attempt to change this situation by investigating the role of Information Structure (IS, cf., e.g., Krifka 2007) in SAA. The basic assumption adapted from IS here will be that the content of a linguistic expression is structured in a non-arbitrary way depending on its context (here: the question), and thus it is possible to predetermine to some extent which part of the expression’s content is relevant. In particular, we will adopt the Question Under Discussion (QUD) approach advanced by Roberts (2012) where the information structure of an answer is determined by an explicit or implicit question in the discourse. We proceed by first introducing the reader to the necessary prerequisites in chapters 2 and 3. Since this is a computational linguistics thesis which is inspired by theoretical linguistic research, we will provide an overview of relevant work in both areas, discussing SAA and Information Structure (IS) in sufficient detail, as well as existing attempts at annotating Information Structure in corpora. After providing the reader with enough background to understand the remainder of the thesis, we launch into a discussion of which IS notions and dimensions are most relevant to our goal. 
We compare the given/new distinction (information status) to the focus/background distinction and conclude that the latter is better suited to our needs, as it captures requested information, which can be either given or new in the context. In chapter 4, we introduce the empirical basis of this work, the Corpus of Reading Comprehension Exercises in German (CREG, Ott, Ziai & Meurers 2012). We outline how, as a task-based corpus, CREG is particularly suited to the analysis of language in context, and how it thus forms the basis of our efforts in SAA and focus detection. Complementing this empirical basis, we present the SAA system CoMiC in chapter 5, which is used to integrate focus into SAA in chapter 8. Chapter 6 then delves into the creation of a gold standard for automatic focus detection. We describe what the desiderata for such a gold standard are and how a subset of the CREG corpus is chosen for manual focus annotation. Having determined these prerequisites, we proceed in detail to our novel annotation scheme for focus, and its intrinsic evaluation in terms of inter-annotator agreement. We also discuss explorations of using crowd-sourcing for focus annotation. After establishing the data basis, we turn to the task of automatic focus detection in short answers in chapter 7. We first define the computational task as classifying whether a given word of an answer is focused or not. We experiment with several groups of features and explain in detail the motivation for each: syntax and lexis of the question and the answer, positional features and givenness features, taking into account both question and answer properties. Using the adjudicated gold standard we established in chapter 6, we show that focus can be detected robustly using these features in a word-based classifier in comparison to several baselines. 
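The word-based classification setup described for chapter 7 can be sketched as follows. The feature names, the toy rule standing in for a trained classifier, and the example sentences are all illustrative assumptions; the thesis's actual feature set and model are richer.

```python
# Sketch of word-level focus detection as binary classification.
# Feature names (givenness, position, content-word proxy) are invented
# for illustration, not the thesis's actual features.

def extract_features(word, position, answer_len, question_words):
    """Per-word features combining answer-internal and question-based cues."""
    return {
        "given": word.lower() in question_words,   # givenness w.r.t. the question
        "rel_position": position / answer_len,     # positional feature
        "is_content_word": len(word) > 3,          # crude lexis proxy
    }

def classify_focus(features):
    """Toy rule-based stand-in for a trained classifier:
    words not given in the question are treated as focused."""
    return not features["given"]

question = "Wo wohnt Peter ?".lower().split()
answer = "Peter wohnt in Berlin".split()
labels = [
    classify_focus(extract_features(w, i, len(answer), set(question)))
    for i, w in enumerate(answer)
]
print(list(zip(answer, labels)))
# -> [('Peter', False), ('wohnt', False), ('in', True), ('Berlin', True)]
```

Here the requested information ("in Berlin", answering "Wo") comes out as focused because it is not given in the question, which mirrors the intuition behind using question properties as classification features.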
In chapter 8, we describe the integration of focus information into SAA, which is both an extrinsic testbed for focus annotation and detection per se and the computational task we originally set out to advance. We show that there are several possible ways of integrating focus information into an alignment-based SAA system, and discuss each one’s advantages and disadvantages. We also experiment with using focus vs. using givenness in alignment before concluding that a combination of both yields superior overall performance. Finally, chapter 9 presents a summary of our main research findings along with the contributions of this thesis. We conclude that analyzing focus in authentic data is not only possible but necessary for a) developing context-aware SAA approaches and b) grounding and testing linguistic theory. We give an outlook on where future research needs to go and what particular avenues could be explored.

    CoMiC: Exploring Text Segmentation and Similarity in the English Entrance Exams Task. CLEF

    This paper describes our contribution to the English Entrance Exams task of CLEF 2015, which requires participating systems to automatically solve multiple-choice reading comprehension tasks. We use a combination of text segmentation and different similarity measures with the aim of exploiting two observed aspects of the tests: 1) the often linear relationship between the reading text and the test questions and 2) the differences in the linguistic encoding of content in distractor answers vs. the correct answer. Using features based on these characteristics, we train a ranking SVM in order to learn answer preferences. In the official 2015 competition we achieve a c@1 score of 0.29, a modest but encouraging result. We identify two main issues that pave the way towards further research.
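The c@1 measure reported above credits unanswered questions in proportion to the system's accuracy on the answered ones, so abstaining can score better than guessing. A minimal sketch of the computation, with invented counts rather than the paper's actual figures:

```python
def c_at_1(n_correct, n_unanswered, n_total):
    """c@1 = (n_R + n_U * n_R / n) / n, where n_R is the number of correctly
    answered questions, n_U the number left unanswered, and n the total."""
    return (n_correct + n_unanswered * n_correct / n_total) / n_total

# Invented example: 5 correct, 4 unanswered, 20 questions in total.
print(c_at_1(5, 4, 20))  # -> 0.3
```

With no unanswered questions, c@1 reduces to plain accuracy; the bonus term only helps a system whose answered questions are reasonably accurate.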

    Learning what the crowd can do: A case study on focus annotation

    This paper addresses the question of how to explore and advance the conceptualization and applicability of information-structural notions to support the analysis of authentic data. With this we aim at further establishing where advances in linguistic modeling also result in quantifiable gains in real-life tasks. Can, for example, computational linguistic applications be improved by integrating information-structural notions? One of the necessary prerequisites for answering this question is a large enough set of data annotated with the relevant information-structural concepts. The main problem here is that notions like focus are often discussed in the theoretical literature by means of example sentences but rarely analyzed in substantial amounts of authentic data.
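Agreement between annotators on such focus labels, whether experts or crowd workers, is commonly quantified with a chance-corrected coefficient such as Cohen's kappa. A self-contained sketch, with invented toy labels:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items:
    observed agreement corrected for the agreement expected by chance."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Chance agreement from each annotator's marginal label distribution.
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Invented binary focus labels (1 = focused) from two annotators.
ann1 = [1, 1, 0, 1, 0, 0, 1, 0]
ann2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(cohens_kappa(ann1, ann2))  # -> 0.5
```

A kappa of 1.0 means perfect agreement, 0.0 means agreement no better than chance; intrinsic validation of an annotation scheme typically reports this kind of figure.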

    Demographic, clinical and antibody characteristics of patients with digital ulcers in systemic sclerosis: data from the DUO Registry

    OBJECTIVES: The Digital Ulcers Outcome (DUO) Registry was designed to describe the clinical and antibody characteristics, disease course and outcomes of patients with digital ulcers associated with systemic sclerosis (SSc). METHODS: The DUO Registry is a European, prospective, multicentre, observational registry of SSc patients with ongoing digital ulcer disease, irrespective of treatment regimen. Data collected included demographics, SSc duration, SSc subset, internal organ manifestations, autoantibodies, and previous and ongoing interventions and complications related to digital ulcers. RESULTS: Up to 19 November 2010 a total of 2439 patients had enrolled in the registry. Most were classified as either limited cutaneous SSc (lcSSc; 52.2%) or diffuse cutaneous SSc (dcSSc; 36.9%). Digital ulcers developed earlier in patients with dcSSc than in those with lcSSc. Almost all patients (95.7%) tested positive for antinuclear antibodies, 45.2% for anti-scleroderma-70 and 43.6% for anticentromere antibodies (ACA). The first digital ulcer in the anti-scleroderma-70-positive cohort occurred approximately 5 years earlier than in the ACA-positive group. CONCLUSIONS: This study provides data from a large cohort of SSc patients with a history of digital ulcers. Early occurrence and high frequency of digital ulcer complications are especially seen in patients with dcSSc and/or anti-scleroderma-70 antibodies.